Day22 Airbnb in Berlin 3/5 the ring zone 柏林Airbnb 3/5 蛋黃區

第 11 屆 iThome 鐵人賽

DAY 22

AI & Data

Hands on Data Cleaning and Scraping 資料清理與爬蟲實作系列第 22 篇

11th鐵人賽 pandas airbnb berlin data visualization

kyt

2019-09-23 19:09:08

1030 瀏覽

分享至

到柏林旅遊，會發現市區交通票券由放射狀分為A、B、C三個區塊，想買長期票券分法只有A+B區、B+C區、A+B+C區，以一般旅遊民眾而言，移動範圍多會在A+B區，網路上沒有找到相關A+B區詳細劃分的資料，但可以這個柏林低排放區為依據，在這個區域的房源交通易達度較高，故只取出這些地區的房源做分析。這個網站由於柏林以大約S-Bahn路面輕軌環狀電車以內規劃為低排放區，姑且就稱之為柏林蛋黃區XD。

The transportation in Berlin is devided into A, B, and C three zones. But if you want to buy a longterm ticket, you can only choose between A+B zone, B+C zone, and A+B+C zone. I couldn't find any data that specify the border of three zones but I found this Low-emission Zone Area instead. The low-emission zone covers the centre of Berlin inside the S-Bahn ring so we will analyse the listings in this area.

載入常用套件並讀入我們要分析的資料

First, we need to import the packeges we need and read in the data we are about to analyse.

# 載入所需套件 import the packages we need
import pandas as pd 
import numpy as np

ringzipcode = pd.read_csv('airbnb/ring_zipcode.csv') # 讀入ring_zipcode.csv檔案 read in the ring_zipcode.csv file
ringzipcode.columns = ['zipcode', 'info'] # 改下欄位名 change the name of the columns
ringzipcode.head(10) # 讀取前十筆資料看一下 call the top 10 rows to take a look

ringzipcode = np.array(ringzipcode)
ringzipcode_list = ringzipcode.tolist()
print(ringzipcode_list[:10]) # 印出前10筆看看 print out the top 10 list

# 我們只要存整個區域在環狀輕軌電車內的郵遞區號 we only want the postcode of areas that's in the S-Bahn ring
rz = []
for i in ringzipcode_list:
    if i[1] == 'innerhalb':
        rz.append(i[0])
print(rz)

import warnings # 忽略警告訊息 
warnings.filterwarnings("ignore")

# 讀入listing檔案來分析 Read in the listing file
listing = pd.read_csv('airbnb/listings.csv') # 讀入listing檔案來分析 read in the listing file
print('There are', listing.id.nunique(), 'listings in the listing data.')
listing.info() # 查看資料細節 the info of data
listing.head(3) # 叫出前三筆資料看看 print out the top three rows of data

abzone = listing["zipcode"].isin(rz)
ablisting = listing[abzone]
ablisting.info()
print('篩過郵遞區號後少了一半以上的房源數量。')
print('The listing counts become less than a half after filtered with postcode in the S-Bahn ring area.')

# 篩選後數量前10名不一樣了，沒想到有些偏遠地區房源數量很多
# top 10 listings changed after filtering, didn't know that there were many listings in the regional area.
grouped_ab_df = ablisting.groupby('neighbourhood_cleansed').count()[['id']].sort_values('id', ascending=False).head(10) 
grouped_ab_df

grouped_ab_df.index

# 房源數量前10的區域 The areas with the top 10 listings
top10 = []
for i in range(10):
    top10.append(grouped_ab_df.index[i])
print(top10)

ablisting_iftop10 = ablisting["neighbourhood_cleansed"].isin(top10)
ab_top10_listing = ablisting[ablisting_iftop10]
ab_top10_listing.info()

ab_top10_listing.head()

# 把位於蛋黃區房源數量前10名的存成一個新的csv檔，明天來好好分析 
# save the top 10 listings areas within the low-emission zone for further analysis
ab_top10_listing.to_csv('ab_top10_listing.csv')

本篇程式碼與範例檔案請參考Github。The code and example files are available on Github.

文中若有錯誤還望不吝指正，感激不盡。
Please let me know if there’s any mistake in this article. Thanks for reading.

Reference 參考資料：

[1] Inside Airbnb

[2] 利用Airbnb來更了解居住城市，以臺北為例 Python實作（上）

[3] Airbnb listings in Berlin

[4] Low-emission Zone Area

[5] How To Use The Berlin Public Transport Without A Fine